knitr document van Steensel lab
TF reporter barcode processing
Introduction
18,000 TF reporters on pMT02 were transfected into mESCs, U2OS and A549 cells. Sequencing yielded barcode counts for these experiments, and these counts are processed in this script.
Description of Data
How to make a good rendering table:
| column1 | column2 | column3 |
|---|---|---|
| 1 | 2 | 3 |
| a | b | c |
Data processing
Path, Libraries, Parameters and Useful Functions
Custom functions
Functions used throughout this script.
Data import
Analysis
Write bc-counts into a long df
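As a rough illustration of this step, the sketch below stacks per-sample count tables into one long data frame and adds a normalized count column. The sample names, barcodes and counts are made up, and it assumes that `rpm` means reads per million within each sample; the actual import code of this script is not shown here.

```r
# Hypothetical per-sample count tables; names and values are illustrative only.
counts_list <- list(
  sample_A = data.frame(barcode = c("AAAC", "CCGT", "GGTA"),
                        count = c(10, 30, 60), stringsAsFactors = FALSE),
  sample_B = data.frame(barcode = c("AAAC", "CCGT", "GGTA"),
                        count = c(5, 0, 15), stringsAsFactors = FALSE)
)

# Stack the tables into one long data frame, recording the sample of each row.
bc_long <- do.call(rbind, lapply(names(counts_list), function(s) {
  data.frame(condition = s, counts_list[[s]], stringsAsFactors = FALSE)
}))
rownames(bc_long) <- NULL

# Assumed definition of rpm: counts scaled to one million reads per sample.
bc_long$rpm <- ave(bc_long$count, bc_long$condition,
                   FUN = function(x) x / sum(x) * 1e6)
```

A long format keeps one row per barcode and sample, which makes per-condition filtering and grouping straightforward.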
Data exploration
2: Samples with high counts usually also have a high percentage of matched barcodes.
3: Barcode counts diverge from the pDNA data for almost all samples (except MCF7-WT-DMSO-rep2, and maybe mES_N2B27-RA_rep3). With this we can exclude that we mainly sequenced barcodes from pDNA.
4: Barcode counts match the insert-seq data, so the GC bias is also present here. I should possibly correct for this.
-> Based on these figures, I will exclude all samples that have fewer than 1700 barcodes with at least 500 normalized counts.
Filtering data
# Remove replicates that do not have any valuable data:
# count, per condition, the barcodes with rpm > 500
for (i in unique(bc_df$condition)) {
  bc_df$n_bc[bc_df$condition == i] <- length(bc_df$barcode[bc_df$rpm > 500 & bc_df$condition == i])
}
# Keep conditions with at least 1700 such barcodes, then drop the helper column
bc_df <- bc_df[bc_df$n_bc >= 1700,] %>% select(-n_bc)
# Remove all non-matching reads
bc_df <- bc_df[!is.na(bc_df$tf),]
Normalization of barcode counts:
Divide cDNA barcode counts by pDNA barcode counts, provided that the barcode has more than 30 pDNA counts
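The rule above can be sketched as follows. This is a minimal example on toy data, not the code used in this script: the column names `cDNA_counts` and `pDNA_counts` and the choice to set below-threshold barcodes to `NA` are assumptions made for illustration.

```r
# Toy data; column names and values are hypothetical.
bc_toy <- data.frame(
  barcode     = c("bc1", "bc2", "bc3"),
  cDNA_counts = c(200, 50, 80),
  pDNA_counts = c(100, 20, 40),
  stringsAsFactors = FALSE
)

# Threshold from the text: only barcodes with more than 30 pDNA counts are normalized.
min_pDNA <- 30

# cDNA/pDNA ratio where pDNA coverage is sufficient, NA otherwise (assumed handling).
bc_toy$activity <- ifelse(bc_toy$pDNA_counts > min_pDNA,
                          bc_toy$cDNA_counts / bc_toy$pDNA_counts,
                          NA_real_)
```

Here `bc2` has only 20 pDNA counts and therefore gets no activity value. The vectorized `ifelse` computes all ratios in one call, so no row-by-row loop is needed.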
Calculate mean activity - filter out outlier barcodes
Filter out data using pDNA-insert-seq data
pDNA data seems ambiguous - the data sometimes doesn’t match - exclude for now
Add pDNA data
Annotate controls
Scale data to 1, use negative controls as reference
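One way to read this step is that activities are divided by a reference value taken from the negative controls of the same condition, so that the negative controls end up centred on 1. The sketch below assumes the per-condition median as that reference; the column names (`neg_ctrl`, `activity`) and the toy values are hypothetical.

```r
# Toy activities; neg_ctrl marks the negative control reporters (hypothetical layout).
act <- data.frame(
  condition = rep(c("A", "B"), each = 4),
  reporter  = rep(c("neg1", "neg2", "tf1", "tf2"), times = 2),
  neg_ctrl  = rep(c(TRUE, TRUE, FALSE, FALSE), times = 2),
  activity  = c(1, 3, 4, 8, 2, 2, 6, 1),
  stringsAsFactors = FALSE
)

# Reference per condition: median activity of the negative controls (an assumption).
ref <- tapply(act$activity[act$neg_ctrl], act$condition[act$neg_ctrl], median)

# Divide every activity by the reference of its own condition.
act$scaled <- act$activity / as.numeric(ref[as.character(act$condition)])
```

After scaling, a value of 2 means twice the negative control level within that condition, which makes conditions with different overall signal comparable.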
Calculate correlations between technical replicates
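A minimal sketch of such a comparison, assuming the activities are arranged with one row per reporter and one column per technical replicate. The replicate names and values are invented, and the use of Pearson correlation is an assumption, since the method is not stated here.

```r
# Toy activity matrix: rows are reporters, columns are technical replicates.
rep_mat <- cbind(
  rep1 = c(1.0, 2.0, 3.0, 4.0, NA),
  rep2 = c(1.1, 1.9, 3.2, 3.9, 5.0),
  rep3 = c(0.9, 2.2, 2.8, 4.1, 5.2)
)

# Pairwise correlations; reporters missing in one replicate are dropped per pair only.
rep_cor <- cor(rep_mat, use = "pairwise.complete.obs", method = "pearson")
```

With `use = "pairwise.complete.obs"`, a reporter that was filtered out in one replicate still contributes to the comparison of the other replicates.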
Data quality plots - correlation between replicates
Summarize replicates: summarize activity values across biological replicates by their geometric mean
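The geometric mean of n positive values is the n-th root of their product, which equals the exponential of the mean of their logarithms. A small sketch on toy data follows; the helper `geo_mean` and the column names are hypothetical and not taken from this script.

```r
# Geometric mean via logs; assumes strictly positive values, NAs are ignored.
geo_mean <- function(x) exp(mean(log(x), na.rm = TRUE))

# Toy data: three biological replicates for two reporters.
rep_df <- data.frame(
  reporter = rep(c("tf1", "tf2"), each = 3),
  bio_rep  = rep(1:3, times = 2),
  activity = c(1, 2, 4, 3, 3, 3),
  stringsAsFactors = FALSE
)

# One summarized activity value per reporter.
summary_df <- aggregate(activity ~ reporter, data = rep_df, FUN = geo_mean)
```

For `tf1` the replicates 1, 2 and 4 give a geometric mean of 2, whereas the arithmetic mean would be about 2.33. The geometric mean is less dominated by a single high replicate, which suits ratio-type data such as cDNA/pDNA activities.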
Exporting data
Session Info
## [1] "Run time: 6.699888 mins"
## [1] "/DATA/usr/m.trauernicht/projects/SuRE_deep_scan_trp53_gr/stimulation_1"
## [1] "Mon Nov 23 14:05:03 2020"
## R version 3.6.3 (2020-02-29)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.7 LTS
##
## Matrix products: default
## BLAS: /usr/lib/libblas/libblas.so.3.6.0
## LAPACK: /usr/lib/lapack/liblapack.so.3.6.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] parallel stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] pheatmap_1.0.12 gridExtra_2.3 RColorBrewer_1.1-2
## [4] readr_1.3.1 haven_2.2.0 ggbeeswarm_0.6.0
## [7] plotly_4.9.2.1 tibble_3.0.1 dplyr_0.8.5
## [10] vwr_0.3.0 latticeExtra_0.6-29 lattice_0.20-38
## [13] stringdist_0.9.5.5 GGally_1.5.0 ggpubr_0.2.5
## [16] magrittr_1.5 ggplot2_3.3.0 stringr_1.4.0
## [19] plyr_1.8.6 data.table_1.12.8
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.5 tidyr_1.0.0 png_0.1-7 assertthat_0.2.1
## [5] digest_0.6.27 mime_0.9 R6_2.5.0 evaluate_0.14
## [9] httr_1.4.1 pillar_1.4.3 rlang_0.4.8 lazyeval_0.2.2
## [13] Matrix_1.2-18 rmarkdown_2.5 splines_3.6.3 labeling_0.3
## [17] htmlwidgets_1.5.2 munsell_0.5.0 shiny_1.4.0 httpuv_1.5.4
## [21] compiler_3.6.3 vipor_0.4.5 xfun_0.19 pkgconfig_2.0.3
## [25] mgcv_1.8-31 htmltools_0.5.0 tidyselect_1.1.0 reshape_0.8.8
## [29] viridisLite_0.3.0 later_1.1.0.1 crayon_1.3.4 withr_2.1.2
## [33] grid_3.6.3 nlme_3.1-143 xtable_1.8-4 jsonlite_1.7.1
## [37] gtable_0.3.0 lifecycle_0.2.0 scales_1.1.0 stringi_1.5.3
## [41] farver_2.0.1 ggsignif_0.6.0 reshape2_1.4.4 promises_1.1.1
## [45] ellipsis_0.3.0 vctrs_0.2.4 tools_3.6.3 forcats_0.4.0
## [49] glue_1.4.2 beeswarm_0.2.3 purrr_0.3.3 hms_0.5.3
## [53] crosstalk_1.0.0 jpeg_0.1-8.1 prettydoc_0.4.0 fastmap_1.0.1
## [57] yaml_2.2.1 colorspace_1.4-1 knitr_1.30